Sense and Reference Disambiguation in Wikipedia

Authors

  • Hui Shen
  • Razvan C. Bunescu
  • Rada Mihalcea
Abstract

Wikipedia articles are annotated by volunteer contributors with numerous links that connect words and phrases to relevant titles in Wikipedia. In this paper, we identify inconsistencies in the user annotation of links and show that they can have a substantial impact on the performance of word sense disambiguation systems that are trained on Wikipedia links. We describe two major types of link annotations – sense and reference – that are frequently used without being explicitly distinguished in Wikipedia, and present an approach to training sense and reference disambiguation systems in the presence of such annotation inconsistencies. Experimental results demonstrate that accounting for annotation ambiguity in Wikipedia links leads to significant improvements in disambiguation accuracy.
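As a rough illustration of how Wikipedia links can serve as sense annotations, the sketch below collects (anchor text, target title) pairs from wikitext and groups the targets seen for each anchor. It is a minimal sketch and not the authors' pipeline: the function names, the regular expression, and the sample sentences are assumptions made for this example, and the article titles in the sample are illustrative rather than checked against Wikipedia. It relies only on the standard piped-link syntax `[[Target|anchor]]`.

```python
import re
from collections import defaultdict

# Matches [[Target]] and [[Target|anchor text]]; a "#section" fragment after
# the target is skipped. Hypothetical helper pattern, not from the paper.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]+))?\]\]")


def extract_link_annotations(wikitext):
    """Return a list of (anchor, target_title) pairs, one per internal link."""
    pairs = []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        # An unpiped link uses the target title itself as the anchor text.
        anchor = (match.group(2) or target).strip()
        pairs.append((anchor, target))
    return pairs


def group_by_anchor(pairs):
    """Map each lowercased anchor to the set of titles it was linked to."""
    senses = defaultdict(set)
    for anchor, target in pairs:
        senses[anchor.lower()].add(target)
    return dict(senses)


# Illustrative wikitext; the titles are assumptions for the example.
sample = (
    "The [[Atmosphere of Earth|atmosphere]] protects life. "
    "The cafe had a relaxed [[Atmosphere (ambiance)|atmosphere]]. "
    "See also [[Mars]]."
)

pairs = extract_link_annotations(sample)
senses = group_by_anchor(pairs)
```

An anchor that maps to more than one title is a candidate ambiguous word, and each linked occurrence becomes a labeled training example. Because annotators may link the same usage to either a general or a more specific title, the labels gathered this way can be inconsistent, which is the problem the abstract describes.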

Similar resources

Word Sense Disambiguation Using Wikipedia

This paper describes explorations in word sense disambiguation using Wikipedia as a source of sense annotations. Through experiments on four different languages, we show that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers.

Using Wikipedia for Automatic Word Sense Disambiguation

This paper describes a method for generating sense-tagged data using Wikipedia as a source of sense annotations. Through word sense disambiguation experiments, we show that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers.

Improving Wikipedia Miner Word Sense Disambiguation Algorithm

This document describes improvements to the Wikipedia Miner word sense disambiguation algorithm. The original algorithm performs very well in detecting key terms in documents and disambiguating them against Wikipedia articles. By replacing the original Normalized Google Distance inspired measure with a Jaccard coefficient inspired measure and taking into account additional features, the disam...

Automatic Identification and Disambiguation of Concepts and Named Entities in the Multilingual Wikipedia

In this paper we present an automatic multilingual annotation of the Wikipedia dumps in two languages, with both word senses (i.e. concepts) and named entities. We use Babelfy 1.0, a state-of-the-art multilingual Word Sense Disambiguation and Entity Linking system. As its reference inventory, Babelfy draws upon BabelNet 3.0, a very large multilingual encyclopedic dictionary and semantic network...

Wikipedia Mining for Triple Extraction Enhanced by Co-reference Resolution

Since Wikipedia has become a huge-scale database storing a wide range of human knowledge, it is a promising corpus for knowledge extraction. A considerable amount of research on Wikipedia mining has been conducted, and it has been confirmed that Wikipedia is an invaluable corpus. Wikipedia’s impressive characteristics are not limited to the scale, but also include the dense link structure...

Publication date: 2012